85 research outputs found

    Enumerating the k closest pairs mechanically

    Let $S$ be a set of $n$ points in $D$-dimensional space, where $D$ is a constant, and let $k$ be an integer between $1$ and $\binom{n}{2}$. An algorithm is given that computes the $k$ closest pairs in the set $S$ in $O(n \log n + k)$ time, using $O(n + k)$ space. The algorithm fits in the algebraic decision tree model and is, therefore, optimal.
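To make the problem statement concrete, here is a minimal brute-force baseline in Python. It is not the algorithm from the abstract: it examines every one of the $\binom{n}{2}$ pairs, so it runs in roughly $O(n^2 \log k)$ time rather than $O(n \log n + k)$. The function name `k_closest_pairs` is a hypothetical choice for illustration.

```python
import heapq
import math
from itertools import combinations


def k_closest_pairs(points, k):
    """Return the k closest pairs as (distance, p, q) tuples, nearest first.

    Brute-force baseline: enumerates every pair of points, so it is only
    suitable for small inputs. Points are tuples of coordinates.
    """
    pairs = ((math.dist(p, q), p, q) for p, q in combinations(points, 2))
    return heapq.nsmallest(k, pairs)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 5), (5, 6.5)]
    for d, p, q in k_closest_pairs(pts, 2):
        print(d, p, q)
```

A baseline like this can serve as a reference when checking a faster implementation on small random inputs.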

    Computational Molecular Biology

    Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 60's and early 70's largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 70's and 80's, while Computer Science became involved with the new biological problems in the late 1980's. Computational problems have gained further importance in molecular biology through the various genome projects which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, like most of the literature on the protein folding problem, as well as databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography

    EDISON-WMW: Exact Dynamic Programing Solution of the Wilcoxon-Mann-Whitney Test

    In many research disciplines, hypothesis tests are applied to evaluate whether findings are statistically significant or could be explained by chance. The Wilcoxon–Mann–Whitney (WMW) test is among the most popular hypothesis tests in medicine and life science to analyze if two groups of samples are equally distributed. This nonparametric statistical homogeneity test is commonly applied in molecular diagnosis. Generally, the solution of the WMW test takes a high combinatorial effort for large sample cohorts containing a significant number of ties. Hence, the P value is frequently approximated by a normal distribution. We developed EDISON-WMW, a new approach to calculate the exact permutation of the two-tailed unpaired WMW test without any corrections required and allowing for ties. The method relies on dynamic programing to solve the combinatorial problem of the WMW test efficiently. Beyond a straightforward implementation of the algorithm, we presented different optimization strategies and developed a parallel solution. Using our program, the exact P value for large cohorts containing more than 1000 samples with ties can be calculated within minutes. We demonstrate the performance of this novel approach on randomly generated data, benchmark it against 13 other commonly applied approaches and moreover evaluate molecular biomarkers for lung carcinoma and chronic obstructive pulmonary disease (COPD). We found that approximated P values were generally higher than the exact solution provided by EDISON-WMW. Importantly, the algorithm can also be applied to high-throughput omics datasets, where hundreds or thousands of features are included. To provide easy access to the multi-threaded version of EDISON-WMW, a web-based solution of our algorithm is freely available at http://www.ccb.uni-saarland.de/software/wtest/
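The idea of computing an exact WMW P value by dynamic programming can be illustrated with the classical textbook recurrence for the null distribution of the Mann-Whitney U statistic. This sketch is not EDISON-WMW: it assumes there are no ties, which is precisely the case the abstract's method goes beyond, and the names `count_u` and `exact_two_sided_p` are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(u, m, n):
    """Count orderings of m X-values and n Y-values with U == u (no ties).

    U is the number of (x, y) pairs with x > y.
    """
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest observation is either an X (it beats all n Y-values,
    # adding n to U) or a Y (it adds nothing to U).
    return count_u(u - n, m - 1, n) + count_u(u, m, n - 1)


def exact_two_sided_p(x, y):
    """Exact two-sided P value for small samples without ties."""
    m, n = len(x), len(y)
    u = sum(1 for a in x for b in y if a > b)
    mean = m * n / 2
    # Sum the probability of every U at least as far from the mean as observed.
    extreme = sum(
        count_u(v, m, n)
        for v in range(m * n + 1)
        if abs(v - mean) >= abs(u - mean)
    )
    return extreme / comb(m + n, m)


if __name__ == "__main__":
    print(exact_two_sided_p([1, 2, 3], [4, 5, 6]))
```

The recursion is memoized, so each `(u, m, n)` state is computed once; for large cohorts an iterative table would be needed to avoid Python's recursion limit.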

    Algorithm engineering for optimal alignment of protein structure distance matrices

    Protein structural alignment is an important problem in computational biology. In this paper, we present first successes on provably optimal pairwise alignment of protein inter-residue distance matrices, using the popular Dali scoring function. We introduce the structural alignment problem formally, which enables us to express a variety of scoring functions used in previous work as special cases in a unified framework. Further, we propose the first mathematical model for computing optimal structural alignments based on dense inter-residue distance matrices. We therefore reformulate the problem as a special graph problem and give a tight integer linear programming model. We then present algorithm engineering techniques to handle the huge integer linear programs of real-life distance matrix alignment problems. Applying these techniques, we can compute provably optimal Dali alignments for the very first time

    Systematic permutation testing in GWAS pathway analyses: identification of genetic networks in dilated cardiomyopathy and ulcerative colitis

    Background: Genome-wide association studies (GWAS) are applied to identify genetic loci that are associated with complex traits and human diseases. Analogous to the evolution of gene expression analyses, pathway analyses have emerged as important tools to uncover functional networks of genome-wide association data. Usually, pathway analyses combine statistical methods with a priori available biological knowledge. To determine significance thresholds for associated pathways, correction for multiple testing and over-representation permutation testing is applied. Results: We systematically investigated the impact of three different permutation test approaches for over-representation analysis to detect false positive pathway candidates and evaluate them on genome-wide association data of Dilated Cardiomyopathy (DCM) and Ulcerative Colitis (UC). Our results provide evidence that the gold standard, permuting the case–control status, effectively improves specificity of GWAS pathway analysis. Although permutation of SNPs does not maintain linkage disequilibrium (LD), these permutations represent an alternative for GWAS data when case–control permutations are not possible. Gene permutations, however, did not add significantly to the specificity. Finally, we provide estimates on the required number of permutations for the investigated approaches. Conclusions: To discover potential false positive functional pathway candidates and to support the results from standard statistical tests such as the Hypergeometric test, permutation tests of case–control data should be carried out. The most reasonable alternative was case–control permutation; if this is not possible, SNP permutations may be carried out. Our study also demonstrates that significance values converge rapidly with an increasing number of permutations. By applying the described statistical framework we were able to discover axon guidance, focal adhesion and calcium signaling as important DCM-related pathways and Intestinal immune network for IgA production as most significant UC pathway.
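The Hypergeometric test mentioned in the abstract is the standard over-representation test. The following sketch computes its upper-tail P value with the Python standard library only. The parameter names (`n_genes`, `n_pathway`, `n_selected`, `n_overlap`) are assumptions chosen for illustration, and the permutation schemes compared in the study are not reproduced here.

```python
from math import comb


def hypergeom_upper_tail(n_genes, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_genes:    size of the gene universe
    n_pathway:  genes in the universe that belong to the pathway
    n_selected: genes flagged as associated (e.g. from GWAS hits)
    n_overlap:  flagged genes that belong to the pathway
    """
    total = comb(n_genes, n_selected)
    upper = min(n_pathway, n_selected)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(
        comb(n_pathway, i) * comb(n_genes - n_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    print(hypergeom_upper_tail(10, 3, 3, 3))
```

A permutation test would replace this analytic tail with the fraction of permuted data sets whose overlap is at least as large as the observed one.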

    Louse (Insecta : Phthiraptera) mitochondrial 12S rRNA secondary structure is highly variable

    Lice are ectoparasitic insects hosted by birds and mammals. Mitochondrial 12S rRNA sequences obtained from lice show considerable length variation and are very difficult to align. We show that the louse 12S rRNA domain III secondary structure displays considerable variation compared to other insects, in both the shape and number of stems and loops. Phylogenetic trees constructed from tree edit distances between louse 12S rRNA structures do not closely resemble trees constructed from sequence data, suggesting that at least some of this structural variation has arisen independently in different louse lineages. Taken together with previous work on mitochondrial gene order and elevated rates of substitution in louse mitochondrial sequences, the structural variation in louse 12S rRNA confirms the highly distinctive nature of molecular evolution in these insects

    Computation of significance scores of unweighted Gene Set Enrichment Analyses

    Background: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated on top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time-consuming procedure that yields only estimates of the p-values. Results: We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, such as varying findings in different runs caused by the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets, but additionally evaluated expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.
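One way to picture an exact computation of this kind is to count, by dynamic programming, the gene orderings whose running sum never reaches the observed extreme. The sketch below rests on assumptions not stated in the abstract: the running sum rises by (list length minus category size) for each category gene and falls by the category size for every other gene, and the statistic is the maximum absolute running sum. The function name `gsea_exact_p` is hypothetical, and this is not necessarily the authors' formulation.

```python
from math import comb


def gsea_exact_p(n_total, n_hits, observed):
    """P(max |running sum| >= observed) under a uniformly random gene order.

    n_total:  length of the sorted gene list
    n_hits:   number of list genes that belong to the category
    observed: observed maximum absolute running-sum value
    """
    up, down = n_total - n_hits, n_hits
    # paths[h]: number of list prefixes containing h category genes whose
    # running sum has stayed strictly below `observed` in absolute value.
    paths = {0: 1}
    for pos in range(1, n_total + 1):
        nxt = {}
        for h, count in paths.items():
            for h2 in (h, h + 1):  # next gene is a miss or a hit
                misses = pos - h2
                if h2 > n_hits or misses > n_total - n_hits:
                    continue
                running_sum = h2 * up - misses * down
                if abs(running_sum) < observed:
                    nxt[h2] = nxt.get(h2, 0) + count
        paths = nxt
    inside = paths.get(n_hits, 0)
    return 1 - inside / comb(n_total, n_hits)


if __name__ == "__main__":
    print(gsea_exact_p(3, 1, 2))
```

Because the running sum at a position depends only on how many category genes have been seen, the table has at most `n_hits + 1` states per position, which keeps both runtime and memory small.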

    Clinical predictors of long-term survival in newly diagnosed transplant eligible multiple myeloma - an IMWG Research Project

    Purpose: multiple myeloma is considered an incurable hematologic cancer but a subset of patients can achieve long-term remissions and survival. The present study examines the clinical features of long-term survival as it correlates to depth of disease response. Patients & Methods: this was a multi-institutional, international, retrospective analysis of high-dose melphalan-autologous stem cell transplant (HDM-ASCT) eligible MM patients included in clinical trials. Clinical variable and survival data were collected from 7291 MM patients from Czech Republic, France, Germany, Italy, Korea, Spain, the Nordic Myeloma Study Group and the United States. Kaplan–Meier curves were used to assess progression-free survival (PFS) and overall survival (OS). Relative survival (RS) and statistical cure fractions (CF) were computed for all patients with available data. Results: achieving CR at 1 year was associated with superior PFS (median PFS 3.3 years vs. 2.6 years, p < 0.0001) as well as OS (median OS 8.5 years vs. 6.3 years, p < 0.0001). Clinical variables at diagnosis associated with 5-year survival and 10-year survival were compared with those associated with 2-year death. In multivariate analysis, age over 65 years (OR 1.87, p = 0.002), IgA Isotype (OR 1.53, p = 0.004), low albumin < 3.5 g/dL (OR = 1.36, p = 0.023), elevated beta 2 microglobulin ≥ 3.5 mg/dL (OR 1.86, p < 0.001), serum creatinine levels ≥ 2 mg/dL (OR 1.77, p = 0.005), hemoglobin levels < 10 g/dL (OR 1.55, p = 0.003), and platelet count < 150k/μL (OR 2.26, p < 0.001) appeared to be negatively associated with 10-year survival. The relative survival for the cohort was ~0.9, and the statistical cure fraction was 14.3%. Conclusions: these data identify CR as an important predictor of long-term survival for HDM-ASCT eligible MM patients. They also identify clinical variables reflective of higher disease burden as poor prognostic markers for long-term survival.
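For readers unfamiliar with the Kaplan–Meier curves used here for PFS and OS, the product-limit estimator can be sketched in a few lines of Python. The function name `kaplan_meier` and the toy input are illustrative assumptions; no study data are reproduced, and a real analysis would normally rely on an established survival-analysis package.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up time for each patient
    events: True if the event (e.g. death) was observed at that time,
            False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

Censored patients lower the number at risk without lowering the survival estimate, which is what distinguishes this estimator from a simple proportion of survivors.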